57 research outputs found
Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections
Data visualisation helps in understanding data represented by multiple
variables, also called features, stored in a large matrix where individuals are
stored in rows and variable values in columns. These data structures are
frequently called multidimensional spaces. In this paper, we illustrate ways of
employing the visual results of multidimensional projection algorithms to
understand and fine-tune the parameters of their mathematical framework. Some
of the mathematical constructs common to these approaches are Laplacian matrices,
Euclidean distance, cosine distance, and statistical methods such as
Kullback-Leibler divergence, employed to fit probability distributions and
reduce dimensions. Two of the relevant algorithms in the data visualisation
field are t-distributed stochastic neighbour embedding (t-SNE) and
Least-Square Projection (LSP). These algorithms can be used to understand
a range of mathematical functions, including their impact on datasets. In
this article, mathematical parameters of underlying techniques such as
Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods
behind LSP are adjusted to reflect the properties afforded by the mathematical
formulation. The results, supported by illustrations of the LSP and t-SNE
processes, are meant to inspire students to understand the mathematics
behind such methods and to apply them in effective data analysis tasks across
multiple applications.
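As a side note for readers, the Kullback-Leibler divergence mentioned in the abstract — the quantity t-SNE minimises between high- and low-dimensional neighbourhood distributions — can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function name and smoothing constant are our own assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions; eps avoids log(0).
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

t-SNE evaluates this divergence with `p` built from pairwise affinities in the original space and `q` from affinities in the projection, then moves projected points to reduce it.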
LDPP at the FinNLP-2022 ERAI task: Determinantal point processes and variational auto-encoders for identifying high-quality opinions from a pool of social media posts
Social media and online forums have made it easier for people to share their views and opinions on various topics in society. In this paper, we focus on posts discussing investment-related topics. When it comes to investment, people can now easily share their opinions about online traded items on social media and provide rationales to support their arguments. However, there are millions of posts to read, potentially including posts from amateur investors or completely unrelated posts. Identifying the most important posts that could lead to a higher maximal potential profit (MPP) and a lower maximal loss for investment is not a trivial task. In this paper, we propose to use determinantal point processes and variational autoencoders to identify high-quality posts from the given rationales. Experimental results suggest that our method mines higher-quality posts than random selection, and that latent variable modelling improves the quality of the selected posts.
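For intuition, a determinantal point process favours subsets that are both high-quality and diverse, because the determinant of a similarity kernel shrinks when selected items are redundant. The sketch below shows greedy MAP inference over such a kernel; the kernel construction and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def greedy_dpp(L, k):
    # Greedy MAP inference for a DPP with kernel L: repeatedly add the
    # item that most increases the log-determinant of the selected
    # submatrix, which rewards quality while penalising redundancy.
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected
```

With a kernel `L = V @ V.T` built from post embeddings `V`, near-duplicate posts make the submatrix nearly singular, so the greedy step skips them in favour of a dissimilar post.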
The role of habitat features in a primary succession
In order to determine the role of habitat features in a primary succession on lava domes of Terceira Island (Azores), we addressed the following questions: (1) Is the rate of cover development related to environmental stress? (2) Do succession rates differ as a result of habitat differences? One transect, intercepting several habitat types (rocky hummocks,
hollows and pits, small and large fissures), was established from the slope to the summit of a 247 yr old dome. Data on floristic composition, vegetation bioarea, structure, demography and soil nutrients were collected. Quantitative and qualitative similarities among habitats were also analyzed. Cover development and species accumulation are mainly dependent on
habitat features. Habitat features play a critical role in determining the rate of succession by providing different environmental conditions that enable different rates of colonization and
cover development. Since the slope’s surface is composed of hummocks, hollows and pits,
the low succession rates in these habitats are responsible for the lower rates of succession in this geomorphologic unit, whereas the presence of fissures on the dome’s summit accelerates its succession rate.
Visual analysis of interactive document clustering streams
Interactive clustering techniques play a key role by putting the user in the clustering loop, allowing her to interact with document group abstractions instead of full-length documents. This lets users focus on corpus exploration as an incremental task. To explore the incremental aspect of information discovery, this article proposes a visual component to depict clustering membership changes throughout a clustering iteration loop in both static and dynamic data sets. The visual component is evaluated with an expert user and in an experiment with data streams.
GGNN@Causal News Corpus 2022: Gated graph neural networks for causal event classification from social-political news articles
The discovery of causality mentions in text is a core cognitive concept and appears in many natural language processing (NLP) applications. In this paper, we study the task of Event Causality Identification (ECI) from social-political news. The aim of the task is to detect causal relationships between event mention pairs in text. Although deep learning models have recently achieved state-of-the-art performance on many NLP tasks and applications, most of them still fail to capture the rich semantic and syntactic structures within sentences, which are key for causality classification. We present a solution for causal event detection from social-political news that captures semantic and syntactic information based on gated graph neural networks (GGNN) and contextualized language embeddings. Experimental results show that our proposed method outperforms the baseline model, BERT (Bidirectional Encoder Representations from Transformers), in terms of F1-score and accuracy.
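For readers unfamiliar with gated graph neural networks, a single propagation step combines message passing over the sentence graph with GRU-style gating. The numpy sketch below is a generic illustration of that update, under our own naming and weight conventions, not the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, A, Wz, Wr, Wh, Uz, Ur, Uh):
    # One gated propagation step: each node aggregates neighbour
    # states via the adjacency matrix A, then updates its state with
    # GRU-style update (z) and reset (r) gates.
    m = A @ h                        # message passing over the graph
    z = sigmoid(m @ Wz + h @ Uz)     # update gate
    r = sigmoid(m @ Wr + h @ Ur)     # reset gate
    h_tilde = np.tanh(m @ Wh + (r * h) @ Uh)
    return (1 - z) * h + z * h_tilde
```

Stacking several such steps lets information from syntactic neighbours (e.g. dependency-parse edges) flow into each event mention's representation before classification.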
Explaining neighborhood preservation for multidimensional projections
Dimensionality reduction techniques are the tools of choice for exploring high-dimensional datasets by means of low-dimensional projections. However, even state-of-the-art projection methods fail, to various degrees, to perfectly preserve the structure of the data, expressed in terms of inter-point distances and point neighborhoods. To support better interpretation of a projection, we propose several metrics for quantifying errors related to neighborhood preservation. Next, we propose a number of visualizations that allow users to explore and explain the quality of neighborhood preservation at different scales, captured by the aforementioned error metrics. We demonstrate our exploratory views on three real-world datasets and two state-of-the-art multidimensional projection techniques. São Paulo Research Foundation (FAPESP) (grant 2012/07722-9); CAPES–NUFFIC 028/1
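One simple metric in the spirit of this abstract scores how well a projection preserves each point's k-nearest-neighbour set. This is an illustrative sketch of such a neighbourhood-preservation score, not one of the paper's proposed metrics.

```python
import numpy as np

def knn_preservation(X_high, X_low, k=5):
    # Average fraction of each point's k nearest neighbours in the
    # high-dimensional space that remain among its k nearest
    # neighbours in the low-dimensional projection (1.0 = perfect).
    def knn(X):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)   # a point is not its own neighbour
        return np.argsort(D, axis=1)[:, :k]
    hi = knn(np.asarray(X_high, dtype=float))
    lo = knn(np.asarray(X_low, dtype=float))
    return float(np.mean([len(set(hi[i]) & set(lo[i])) / k
                          for i in range(len(hi))]))
```

Per-point scores (the list inside the mean) could also be mapped to colour in a projection view to localise where neighbourhoods break.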
UCCNLP@SMM4H’22:Label distribution aware long-tailed learning with post-hoc posterior calibration applied to text classification
The paper describes our submissions to the Social Media Mining for Health (SMM4H) workshop 2022 shared tasks. We participated in two tasks: (1) classification of adverse drug event (ADE) mentions in English tweets (Task 1a) and (2) classification of self-reported intimate partner violence (IPV) on Twitter (Task 7). We proposed an approach that uses RoBERTa (A Robustly Optimized BERT Pretraining Approach) fine-tuned with a label-distribution-aware margin loss function and post-hoc posterior calibration for robust inference against class imbalance. We achieved a 4% and a 1% increase in performance on IPV and ADE, respectively, compared with the traditional fine-tuning strategy with unweighted cross-entropy loss.
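The core idea of a label-distribution-aware margin loss is that rarer classes receive larger margins, typically scaling as n_j^{-1/4}, subtracted from the true-class logit before the cross-entropy. The sketch below illustrates that mechanism in numpy; the function names, the constant C, and the plain cross-entropy are our own simplifications, not the authors' training code.

```python
import numpy as np

def ldam_logits(logits, labels, class_counts, C=1.0):
    # Label-distribution-aware margins: class j gets margin
    # C / n_j**0.25, so rare classes must be predicted with a larger
    # score gap. The margin is subtracted from the true-class logit.
    margins = C / np.asarray(class_counts, dtype=float) ** 0.25
    out = logits.astype(float).copy()
    out[np.arange(len(labels)), labels] -= margins[labels]
    return out

def cross_entropy(logits, labels):
    # Mean negative log-likelihood with a stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(labels)), labels].mean())
```

Training minimises `cross_entropy(ldam_logits(...), labels)`; at inference, post-hoc calibration would instead adjust the raw logits by the class prior.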
UNLPSat TextGraphs-16 Natural Language Premise Selection task: Unsupervised Natural Language Premise Selection in mathematical text using sentence-MPNet
This paper describes our system submitted to the TextGraphs 2022 shared task at COLING 2022: Natural Language Premise Selection (NLPS) from mathematical texts. The task of NLPS is to select mathematical statements, called premises, from a knowledge base written in natural language and mathematical formulae, that are most likely to be used in a particular mathematical proof. We formulated this task as an unsupervised semantic similarity task by first obtaining contextualized embeddings of both the premises and the mathematical proofs using sentence transformers. We then computed the cosine similarity between the embeddings of premises and proofs and selected the premises with the highest cosine scores as the most probable. Our system improves over the baseline system, which uses bag-of-words models based on term frequency–inverse document frequency (TF-IDF), in terms of mean average precision (MAP) by about 23.5% (0.1516 versus 0.1228).
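The ranking step this abstract describes — cosine similarity between a proof embedding and each premise embedding, keeping the top scores — can be sketched as follows. The function name and return format are our own; in the actual system the vectors would come from a sentence-transformer encoder.

```python
import numpy as np

def rank_premises(proof_vec, premise_vecs, top_k=3):
    # Cosine similarity between one proof embedding and each premise
    # embedding; returns premise indices sorted by decreasing score.
    P = np.asarray(premise_vecs, dtype=float)
    q = np.asarray(proof_vec, dtype=float)
    sims = (P @ q) / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)[:top_k]
    return order.tolist(), sims[order].tolist()
```

Because the method needs no labelled premise-proof pairs, it stays fully unsupervised: ranking quality depends entirely on the pretrained embeddings.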